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SUMMARY 

We propose a class of spherical wavelet bases for the analysis of geophysical models and for the 
tomographic inversion of global seismic data. Its multiresolution character allows for modeling 
with an effective spatial resolution that varies with position within the Earth. Our procedure is 
numerically efficient and can be implemented with parallel computing. We discuss two pos- 
sible types of discrete wavelet transforms in the angular dimension of the cubed sphere. We 
discuss benefits and drawbacks of these constructions and apply them to analyze the informa- 
tion present in two published seismic wavespeed models of the mantle, for the statistics and 
power of wavelet coefficients across scales. The localization and sparsity properties of wavelet 
bases allow finding a sparse solution to inverse problems by iterative minimization of a combi- 
nation of the £2 norm of data fit and the l\ norm on the wavelet coefficients. By validation with 
realistic synthetic experiments we illustrate the likely gains of our new approach in future in- 
versions of finite-frequency seismic data and show its readiness for global seismic tomography. 
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1 INTRODUCTION 

As long as tomographic Earth models remain the solutions to 
mixed-determined inverse problems ( Nolet 1987 2008 ) there will 
be disagreement over the precise location, shape, form and ampli- 
tude of lateral and radial anomalies in seismic wavespeed that ex- 
ist within the Earth; there will be attempts to derive the best-fitting 
mean structure (e.g. |Becker & B oschi 20 02}, and the needed e fforts 
to validate them (e.g. |Qin et al.|2009[|Bozdag & Trampert|2010| >. At 
the same time, patterns, second-order structure and correlations be- 
tween and within models will continue to be sought with the goal 
of characterizing seismic he terogeneities (e.g. |Passier & Snieder] 
[19951 [Bergeron et al.|[T999l |Hernlund & Houser||2008| ) or relat- 
ing them to geochemical (e.g. |Gurnis||1986| ), tectoni c (e.g. |Yuen 



et al.| 2002, Becker et al. 2006), or geodynamical (e.g. | Jordan et al. 
1993l|Piromallo et al.|2001||Houser & Williams |2009| > processes. It 



has also become clear that model characteristics such as the power 
spectrum of tomographic anomalies ( Chevrot et al. 1998a b ; Boschi 
|& Dz iewonski 1999 ) may teach us as much about the modeler's 
choices of parameterization and regularization as about the model, 
without imparting much information about the physical or statisti- 
cal nature of our complex, physically and chemically differentiated 
system Earth — yet the latter should be our target. In recent work 
questions about the size and scale distribution of Earth structure 
have more fruitfully been addressed by direct inference from the 



data themselves (e.g. Hedlin & Shearer||2000[ |Margerin &" Nolet 
|2003l|Becker et al.|2007||Garcia et al.|2009| > without the detour of 
first deriving a global three-dimensional model and analyzing that. 

By no means are the analysis and representation of volumet- 
ric properties the sole purview of seismology or geodynamics, and 
thus it is not surprising that there is a large literature on the subject 
in virtually every area of scientific inquiry (e.g. medical imaging, 
astronomy, cosmology, computer graphics, image processing, ...). 
In this context much has come to be expected of the special powers 
of wavelets, with their built-in discriminating sensitivity to struc- 
ture in the space and spatial frequency domains ( Daubechies 1992, 
|Strang & Nguyen|1997l [Mallat 2008). Notwithstanding a continued 



interest and clear and present progress in the field (e. g. |Fou foula- 



Georgiou & Kumar|[l994l |Klees & Haa gmans 2000 Freed en&l 
Michel 2004bl |Oliver|2009| >, the use of wavelets is still no matter 



of routine in the geosciences, beyond applications in one and two 
Cartesian dimensions. This despite, or perhaps because, there being 
a wealth of available constructions relevant for global geophysics, 
in other words: on the sphere (e.g.|Schroder & Sweldens 1995 |Nar-| 
Icowich & Ward|1996l[Antoine et al.|20021|Holschneider et al.|2003[ 
Freeden & Michel 2004a ; Hemmat et al. 2005 ; Schmidt et al. 2006 ; 
|Starck et al.|2006l|McEwen et al.|2007[|Wiaux et al.|2007[|Lessig &| 
Fiume 2008}, if not on the ball. Indeed, inasmuch as they involve 
the analysis of cosmological data or computer-generated images, 
the above studies are mostly concerned with surfaces, not volumes. 



2 Simons, Loris, Nolet, Daubechies & others 



In seismology, Chiao & Kuo (2001 ) were, to our knowledge, 
the first to develop a "biorthogonal-Haar" wavelet lifting scheme 
(Schroder & Sweldens 1995 ) for a triangular surface tesselation of 
the sphere suitable for multiscale global tomography. Later, these 
same authors formed a (biorthogonal) spline basis for a Cartesian 
cube useful in exploration geophysics (Chiao & Liang 2003) and 
for regional studies (Hung et al. 2010|>. Finally, |Chevrot & Zhao 
(2007 ) constructed a three-dimensional (orthogonal) Haar basis on 
an equidistant geographical grid which was also used for a regional 
inversion. To this date, a truly three-dimensional wavelet basis on 
the ball with practical utility in the geosciences has been lacking. 

Whatever the role that wavelets will play in it, the future 
of global seismic tomography will continue to involve massive 
amounts of heterogeneous data spanning a range of resolutions, 
from the travel times reported by the global networks to the wave- 
forms of portable deployments, with strong regional concentra- 
tions of station coverage in areas such as Japan, the United States, 
and Europe, supplemented with sparse or non-existent networks 
in less densely populated or oceanic regions. It is also clear that 
finite-frequency kernels, which allow for the correct volumetric 
sensitivity-based weighting of the measurements in distinct fre- 
quency bands, are here to stay, whichever the various ways in which 
they are calculated (see Nolet 2008 and references therein). Ac- 
counting for finite-frequency sensitivity requires an effective over- 
parameterization if one wishes to exploit the extra resolution of- 
fered by the spatial variations in their sensitivity, something for 
which wavelet seem ideally suited also. 

This paper documents some of the extensive prospective work 
that we have done in preparation for realistic wavelet-based global 
seismic inversions. We wish to share the most important insights 
that we have gained through these studies. Our goal remains 
to ensure that there exist performant and efficiently calculable, 
flexible wavelet methods on the three-dimensional ball, to fulfill 
the promise of multire solution analysis (Mallat 1989, Jawerth & 
Sweldens 1994 ) in global seismology. Not just for the representa- 
tion and analysis of seismic models after the fact, but rather for 
their determination, as an integral part of a parsimonious param- 
eterization of the inverse problem — of the sensitivity matrix, of 
the model space, or both. Although there is no objective guarantee 
that Nature, or the interior of the Earth in particular, is parsimo- 
nious in character, sparsity is worth striving for. By simplifying a 
tomographic image to contain a relatively small number of recog- 
nizable objects we facilitate interpretation ( Sambridge et aL 2006|). 
Moreover, such models can be more accurate than their data ( Gauch 
2003 ), a point not to be overlooked in view of the large relative er- 
rors of seismic delay times and amplitudes. 

By flexibility we mean the ability to substitute a particular 
wavelet design for another in any of the three coordinate directions; 
by efficiency we intend to avoid the tedious case-by-case deriva- 
tion of different bases and calculation methods. By performance 
we target the ability to capture the unknown model by explaining 
the data (in an I2 sense) with a minimum of wavelet and scaling 
coefficients, both where the data require the solution to be smooth 
and where they necessitate the presence of sharp contrasts. It is 
of course in this capacity also (e.g. Donoho & Johnstone 1994 
1995 ) that wavelets will distinguish themselves from any other 
traditional method of seismic inversion. As to sparsity, it is both 
numerically and philosophically attractive ( Const able et al.| 1987 ) 
and physically plausible or at least testable that the interior of the 
Earth should be sparse when expressed in a wavelet basis. Fortu- 
nately, for most large underdetermined systems of linear equations 
the minimal i\ norm solution is also the sparsest ( [Candes et al.| 




Figure 1. Aerial view showing our first adaptation of the cubed sphere of 
Ronchi et al. (1996). Of the front-facing four of the in total six "chunks", 
one is gridded to reveal its 2 2 N distinct surface elements (N = 4). 



2006 ; |Donoho||2006|), f or which (fast, iterative) algorithms exist 
( [Daubechies et al.|2004||Loris|2009| ). Elsewhere, [Loris et al.| ( [2007l 



|2010| ) and |Gholamr & Siahkoohi (2010 ) explored the suitability of 
sparsity- seeking thresholded wavelet-based inversion approaches 
in two-dimensional (2D) and three-dimensional (3D) Cartesian set- 
tings relevant to seismic tomography. All of the above issues will 
again be the guiding principles behind the new spherical wavelet 
construction(s) that we present in this paper. 

This paper is organized as follows. In Section [2] we de- 
velop a first class of wavelet constructions on the sphere via a 
well-known Cartesian-to- spherical mapping known as the "cubed 
sphere" ( [Ronchi et al.|1996[[Komatitsch & Tromp|2002| ). As this 
surface tesselation has "seams" separating each of six subdivisions 
or "chunks", we acknowledge these boundaries in the construction 
by using so-called "wavelets on the interval". These revert to the 
classical compactly supported (bi)orthogonal Cartesian construc- 
tions of |Daubechiesl < |1988t and |Cohen et al.| ( |1992| ) in the interior 
domains but receive special consideration on the edges as put forth 
by |Cohen et al.| ( [T993] ). In Section [3] we study the sparsity of two 
global seismic tomographic Earth models by thresholded recon- 
structions of their wavelet transforms applied to the angular coordi- 
nates of the cubed sphere, at constant depth intervals, and consider- 
ing a variety of goodness-of-fit criteria. We furthermore character- 
ize, in Section[4] the scale lengths of heterogeneity in these models 
by reporting the relative contributions of their wavelet and scal- 
ing coefficients in the expansion. In Section [5] we review the main 
approach to obtain sparse wavelet-based solutions to the inverse 
problem of seismic tomography, which were previously discussed 
in a Cartesian framework by |Loris et al.|(2007 2010 ). As using the 
initial construction with such schemes led to undesirable artifacts 
at the edges between the chunks, we derive a second wavelet con- 
struction in Section [6] which appears to be free of such artifacts, as 
we show using realistic synthetic tests in Section [7] While in this 
paper we focus on the angular part of the cubed sphere we gener- 
alize our construction to including the case of the ball and provide 
an outlook for further research in global seismic tomography in the 
concluding Section [8] 
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Figure 2. Geometry, nomenclature, and numbering of the six faces of our first adaptation of the cubed sphere of Ronchi et al. (1996} in a two-dimensional 
"unfolded" view. Rendered is the Earth's topography from the model ETOP05, courtesy of NOAA's National Geophysical Data Center. The projection was 
obtained by spherical-harmonic expansion of the coefficients from this model (Georg Wenzel, pers. comm.) truncated at degree and order L = 2 iY+1 , 
evaluated at the 6 x 2 2N cubed-sphere grid points 77, for N = 8. Minimum, median, and maximum values in this approximation are shown in the legend. 



2 A FIRST CONSTRUCTION 



As Ronchi et al. (1996]), we define the coordinate quartet (£, 77, r, k) 
for each of the k = 1 — ► 6 chunks. The angular coordinates 
— 7r/4 < £,77 < 7r/4 and the radial coordinate r are mapped to 
the usual Cartesian triplet (x,y,z) using the transformation 
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whereby s = y/l + tan 2 £ + tan 2 77. The inverse mapping is ob- 
tained, for t = max(|x|, \y\, \z\), 



[atan(z/y), atan(— x/y), 1 ] 
[atan(y/x), atan(— z/x), 2 ] 
[atan(— y/z), atan(x/z), 3] 
[atan(x/z), atan(— y/z), 4] 
[atan(— z/x), atan(?y/x), 5 ] 
[atan(— x/y), atan(^/y), 6] 



t=-y, 
t — —x, 

£ z, 
t -z, 

£ = X, 



(2) 



whereby r = ^ x 2 + y 2 + z 2 . This parametrization is non- smooth 
across the edges separating the chunks. The above formulas corre- 
spond to the drawing in Fig. [T] where only one of the chunk faces 
is gridded to reveal the angular coordinate lines (£, 77) at a resolu- 
tion that divides this face into 2 4 x 2 4 distinct surface elements. 
Throughout this paper we will quote N as the angular resolution 
level of our cubed sphere, which implies that it has 6 x 2 2N such 
elements, with typical tomography grids having N = 7. 



In principle there are many possibilities to choose the surficial 
coordinates (£, 77) in each chunk. We picked ours so as to minimize 
the splitting of continents over more than one chunk. Our choice 
differs from the canonical version of Ro nchi et"aL] (fl996 ) by a rigid 
rotation of the coordinate system, as can be seen by comparing our 
Fig.|2]with their Figs 15-16. The Euler angles used in our construc- 
tion are a = 0.0339, f3 = 1.1705, and 7 = 1.1909, respectively. 
It is important to note that within a chunk £ and 77 are not spherical 
coordinates; a shift in £ (with 77 fixed) or in 77 (with £ fixed) does 
not correspond to a rotation on the sphere. This is apparent from 
the pinching of coordinate lines in Fig.[T] 

Armed with the coordinate conversions of eqs ([TJ and ([2} we 
are able to regard the problem of designing a wavelet transform for 
the sphere as simply requiring the selection of a certain Cartesian 
wavelet transform which is mapped to and from the sphere. Such 
an approach is philosophically related to those involving stereo- 
graphic projection (Antoine & Vandergheynst 1999, Antoine et al. 
2002 1 |Wiaux et al.|2005] >7 though the fundamental domain of our 
transform remains a single chunk. Within each such chunk, the sur- 
face Jacobian of our mapping is given by the smoothly varying 



Jfarj) = (1 +tan 2 0(l +tan 2 77)/s 3 , V2/2 < J < 1. 
For each of the chunks then the area is given by 

~6~" 



tt/4 



(3) 



(4) 



r/4 



Without this being a uniform mapping, one of the main advantages 
of the chosen coordinate system is thus that the meshes defined on 
each region span the surface of the sphere with an almost constant 
spatial resolution, as noted by Ron chi et"aT] fl996). 
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Figure 3. Wavelet and scaling functions of the D4 construction in the angular coordinates of the cubed sphere, at various scales. The positions of the coefficients 
belonging to the functions in the lettered panels are shown in the diagram in the top left. The scaling function (a), which is averaging in nature, captures what 
remains to be explained after the breakdown into wavelets down to scale 4 is complete. Each of the wavelets, which pick up detailed, derivative, structure, 
is sensitive in a particular direction: to £ in (b), to rj in (c), or diagonally in (d). In the interior domain, away from the edges where boundary functions (not 
shown) live, the patterns repeat exactly, with the footprint at each successive scale half that of the preceding scale. The diagonally sensitive wavelet at scale 2 
is not shown. Every function shown is orthonormal in (£, 77) and their inner products with respect to every other one vanish. 
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Figure 4. Wavelet and scaling functions arising from the CDF 4-2 construction in the angular coordinates of the cubed sphere, with their scale levels indicated. 
The layout is identical to that of Fig.[3] As opposed to the D4 wavelets, the CDF 4-2 construction is biorthogonal, which renders every shown synthesis function 
orthogonal in (£, 77) to its dual, which is used for analysis; none of the dual functions are shown. Unlike the D4 functions the CDF 4-2 have mirror symmetry. 



Ignoring any and all such distortions we are able to unlock 
the power of popular Cartesian wavelet constructions, of which 
we choose the two best known: the orthogonal construction of 
( |1988| ) and the biorthogonal construction of |Cohen] 



Daubechies 



et al. (19921. Both of these lead to compactly supported wavelets 



and scaling functions, though only the biorthogonal ones can be 
(anti) symmetric (except for Haar). Examples of scaling functions 
and wavelets at scales of decreasing dominant wavelength are 
shown in Fig.[3]for the four-tap Daubechies basis (D4) and in Fig. [4] 
for the Cohen-Daubechies-Feauveau family with four and two van- 
ishing moments (CDF 4-2) in analysis and synthesis, respectively. 

The literature on Cartesian wavelet analysis is vast, and it is 
not our intention to repeat any of it here. Most useful for the prac- 
ticing geophysicist will perhaps be the treatises by |Mallat| {2008) 
and Strang ^&~Nguyen|fl997| >; texts focused on algorithms are Press 
|et al.| ( |1992t and, in particular, [Jensen & la Cour-Harbo| ( |200l| ). All 



of the computer code required to reproduce the figures and conduct 
the analyses presented in this paper is molded after these general 
references and will be available from the authors. 

Two aspects of wavelet analysis bear specific mentioning here. 
The first intricacy is how we treat the seams between the chunks. 
In agreement with Cohen et al. (1993) the argument is easily made 
that neither ignoring the seams nor periodization or reflection are 
viable options, as each of these leads to artifacts in the represen- 
tation. We thus follow their suggestion to the letter and construct 
a multire solution basis requiring 2 2N wavelet and scaling coeffi- 
cients for each of the chunk faces having 2 2N surface elements. 
For this we switch to special boundary filters at each of the edges, 
and apply preconditioners to the data prior to transformation in or- 
der to guarantee the usual polynomial cancellation throughout the 
closed rectangular interval. The acknowledgment of the edges in 
this way is the hallmark of the wavelet construction in this section 
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Figure 5. Wavelet and scaling coefficients (top), space-domain reconstructions after thresholding (middle), and "signed" histograms (bottom) of the wavelet 
and scaling coefficients of the "African" (sixth) face of the cubed-sphere version of the Earth's topography first shown in Fig. [2] We have used the precondi- 
tioned interval wavelet transforms on the faces of the cubed sphere, as described in the text. All coefficients were hard-thresholded at the 85 th percentile level, 
retaining only the 15% largest coefficients by absolute value. In the top row, the locations of zeroed coefficients are rendered white; those are also captured by 
the white bars in the histograms. The root mean squared (rms) error of the reconstruction after thresholding is indicated as a percentage of the signal rms. Tick 
marks on the color bars identify the 5 th , 50 th and 95 th percentile of the coefficients or the spatial reconstructions after thresholding, respectively. Interior 
ticks on the histograms roughly coincide with these same percentiles as applied to either the positive and negative coefficients when expressed on a logarithmic 
scale. Histograms for the positive coefficients point up and have ordinates in positive percentages, histograms for the negative coefficients point down and have 
ordinates in negative percentages; these percentages are with respect to the total number of positive and negative coefficients. The blue and red shaded areas 
of the histograms reflect the coefficients retained at the global 85 th thresholding level. 



here (which we call the First Construction). This is as easily done 
for the orthonormal as for the biorthogonal constructions, though 
we have limited the implementation and illustration of this proce- 
dure, in Fig. [5] to the compactly supported two-tap (Haar), four-tap, 
and six-tap orthonormal families (D2, D4 and D6). 

Before we discuss Fig. [5] in any more detail we should intro- 
duce the second important feature that renders wavelet transforms 
in general useful for the analysis and representation of (geophysi- 
cal) data. This second topic is the idea of thresholding, or shrink- 
age. In many applications the wavelet transformation amounts to 



a projection under which many of the expansion coefficients are 
very small: so small that we might as well throw them away; the 
resulting reconstruction will still be close to the original (Donoho 
|& Joh nstone 1994 ). Intuitively, the "best" wavelet basis that we can 
select to represent our data is the one that yields the most near-zero 
coefficients. When these are replaced by zeroes prior to reconstruc- 
tion, as under the definition of hard thresholding (Mallat 2008|, we 
obtain highly compressed versions of the data at hand, with only 
negligible degradation. 
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Fig. [5] explores the effects of thresholding, coefficient statis- 
tics, and reconstruction errors for a model of terrestrial topography, 
a general proxy for the length scales of heterogeneities to be found 
not only at the surface, but also in the interior of the Earth. We fo- 
cus on the sixth, or "African" chunk of our cubed sphere, and use 
the D2, D4 and D6 wavelet bases (on the interval, with precondi- 
tioning). The top row uses the (common) conventions introduced in 
Fig.[3]in plotting the wavelet and scaling coefficients in each of the 
basis after (hard) thresholding them such that only the coefficients 
larger than their value at the 85 th percentile level survive. The co- 
efficients that have now effectively been zeroed out are left white 
in these top three panels. The middle series of panels of Fig.[5]plots 
the spatial reconstruction after thresholding at this level; the root 
mean squared (rms) error of these reconstructions are quoted as a 
percentage of the original root mean squared signal strength. The 
thresholded wavelet transforms allow us to discard, as in these ex- 
amples, 85% of the numbers required to make a map of African 
topography in the cubed- sphere pixel basis: the percentage error 
committed is only 5.8%, 4.9% and 6.7% according to this energy 
criterion in the D2, D4 and D6 bases, respectively. From the map 
views it is clear that despite the relatively small error, the D2 ba- 
sis leads to unsightly block artifacts in the reconstruction, which 
are largely avoided in the smoother and more oscillatory D4 and 
D6 bases. A view of the coefficient statistics is presented in the 
lowermost three panels of Fig. [5] The coefficients are roughly log- 
normally distributed, which helps explain the success of the thresh- 
olded reconstruction approach. While the example here was strictly 
designed to illustrate our algorithms and procedures, we conclude 
that the D4 basis is a good candidate for geophysical data repre- 
sentation, provided the edges between cubed- sphere chunks have 
properly been accounted for. 



3 EARTH MODEL SPARSITY 

In tomographic studies, either as an integral part of the inversion 
of after a solution has been found, the target model is parameter- 
ized by local or global basis func tions ([N olet 2008). Blocks, cells, 
nodes, or voxels (e.g.,|Aki et al.||1977[ |Zhang & T animoto 1993| 



Spakman & Bij waard|200T]jSi mons et a l.|2002[|Debayle & Sam- 



bridge 2004 ; Nolet & Montelli 2005) are all strictly local functions. 
Cubic B-splines (e .g., |Wang & Dahlen||l 995 [| Wang et al.|[T998] 



Boschi et al.|2004|) or wavelets (e.g., |Chiao & Kuo|20011|Chevrot| 



& Zhao|2007[|Loris et al.| 2007 ) are more generally localized func 



tions. Spherical ha rmonics (e.g., [D ziewonski 1984, Wo odhouse & 



Dziewonsla| |1984[ |Ekstrom et al.||1997[ |Trampert & Woodhouse 



1996 2001) are ideally localized spectrally but have global support 
( |Freeden & Michel 1999). An intermediate approach that combines 
spatial and spectral localization was developed using spherical har- 
monic splines by Amirbekyan & Michel (2008) and Amirbekyan 
et al. (2008), but this produces an inverse problem that scales with 
the square of the number of data collected, rendering it impractical 
for the large-scale tomographic systems of the future. 

In preparing for the study of the suitability for solution of such 
massive inverse problems of the wavelet transforms that we intro- 
duced in the previous section, we take a detour in this section by 
addressing the question: is the Earth sparse in a wavelet basis? Of 
course we will never be able to answer this question with any de- 
gree of certainty, but we can investigate, at the very least, whether 
Earth models are sparse in such bases. Because they are, as we 
shall see, we will gain from parameterizing the inversion for fu- 
ture Earth models using the spherical wavelets developed in this 



paper. The expected gains are with respect to numerical efficiency 
but also in terms of regularization. Since wavelets are not global 
functions (ours, as can be seen from Figs. [3] and [4] are compactly 
supported, i.e. vanishing outside their scale-dependent footprint), 
and yet, (bi)orthogonal, the function basis will not dictate the model 
structure in areas of poor data coverage as is the case with spherical 
harmonics (Trampert & Snieder 1996 ; Boschi & Dziewonski 1999 ; 
Amirbekyan et al. 2008). Moreover, though this depends on pre- 
cisely what wavelet construction is being used, they are capable of 
representing both smoothly varying functions as well as preserving 
sharp edges, and their natural multi-resolution nesting will allow 
for the model resolution to vary spatially, as required by the data. 

There is, however, another reason to find out how seismic 
Earth models behave under wavelet transformation: because it en- 
ables us to study the relative importance of model heterogeneity at 
different scale lengths, which is important to help constrain geo- 
chemical and geodynamical models and interpretations of Earth 
structure. The Earth is heterogeneous at all scales but not likely 
everywhere to the same degree; thermally induced deviations from 
the radial average one-dimensional Earth structure are expected to 
be smoother and with longer wavelengths than those due to compo- 
sitional variations; the presence of distinct scatterers further com- 
plicates this picture (Shearer & Earle 2004). In short, we are inter- 
ested in obtaining a power spectral density of sorts ( [Chevrot et al.| 
1998a b, Boschi & Dziewonski 1999 ), as applied to seismic struc- 
ture and how it may vary spatially within the Earth. As we are not 
in the position to return to direct measurements of the energy distri- 
bution Lofheterogen^ 2000 ; Margerin & Nolet 
[2003l|Becker et al.|2007[|Garcia et al.|2009) we will instead study 
the sizes and scales within reported tomographic Earth models. 

From the plethora of seismic Earth models that are available to 
study, we select two mantle models: one by |Montelli et al.| {2006) 
of compressional (P) wavespeed heterogeneity and another by |Rit-| 
|sema et ah] ( |2010| ) of shear (S) wavespeed perturbations. Neither 
model has much at all in common with the other in terms of its 
construction, and from the point of view of parameterization, Mon- 
telli' s model has a tetrahedral grid underlying it, whereas Ritsema's 
expands wavespeed anomalies in a spherical harmonic basis com- 
plete to degree and order 40. At a depth of about 400 km, Figs [6^ 
and[6f show P-wave ( Montelli et al. 2006) and S- wave ( Ritsema 
|et al. |2010| anomalies from the average at that depth. Montelli 's 
model was interpolated (from the tetrahedral grid on which it was 
built) onto the 6 x 2^ (N = 7) points of our cubed sphere, 
whereas Ritsema's model was evaluated (from the listed spher- 
ical harmonic and radial spline expansion coefficients) at these 
same points. Subsequently, the wavelet transform in the D4 basis 
(with special boundary filters and after preconditioning, and up un- 
til scale J — 3) was thresholded and the results re-expanded to 
the spatial grid, identically as we did for the topography in Fig. [5] 
The results for specific values of the thresholding (quoted as the 
percentile of the original wavelet coefficients) are shown in Figs [6k 
and[^ for the 50 th , Figs^ and|6ji for the 85 th , Figs[6}l and]§ 
for the 95 th percentile, respectively. At each level of threshold- 
ing the number of nonzero wavelet/scaling expansion coefficients 
is quoted: at 0% thresholding this number is identical to the number 
of pixels in the surficial cubed sphere being plotted. 

As we have written before, the wavelet transformation does 
not change the number of pieces of information with which it is pre- 
sented. Rather, it dramatically redistributes information in a man- 
ner that allows us to simply omit those coefficients with low values, 
with limited degradation to the field being represented. This recon- 
struction "error" can be visually assessed from the pictures; it is 
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Figure 6. Sparsity and reconstruction stability of two global seismic wavespeed models under incremental hard thresholding of their wavelet and scaling 
coefficients using the preconditioned edge-cognizant D4 wavelet basis I Daubechies 1988 Cohen et al. 1993) in the angular coordinates of the cubed sphere, as 
developed in this paper, (a-e) Results for the P-wave seismic model of Montelli et al. ( 2006) and (f-j) for the S-wave seismic model of Ritsema et al. ( 2010 1, 
at the same depth of 406 km below the surface of the Earth, for cubed spheres with 6 x 2 2N elements (N = 7), and to a 2 J dyadic subdivision (J = 3). 
As a function of the percentage of the coefficients that are being thresholded, and relatively to the original unthresholded values, the bottom panels quote the 
spatial £2 norms of the reconstruction error (in black), the total variation norms of the reconstructed images in the space domain (in red), and the i\ norms of 
the coefficients that remain (in blue). The values obtained for the cases shown in map view are shown as filled circles on these graphs, and the corresponding 
metrics in the D2, D4 and D6 bases are tabulated in Table[T] The reconstructions remain faithful to the originals even at elevated levels of thresholding. 
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depth thresholding 



|Montelli etaT|(2006j 

relative £2 error norm (% ) 



|Ritsema etal.|{2010| 

relative £2 error norm (% ) 



(km) 


percentile 


D2 


D4 


D6 


D2 


D4 


D6 


203 


50 


1.816 


0.808 


3.025 


1.014 


0.236 


0.229 




85 


9.212 


4.653 


6.324 


5.028 


1.360 


0.722 




95 


18.721 


11.214 


11.456 


10.073 


4.351 


3.172 


406 


50 


2.294 


1.107 


3.983 


1.267 


0.311 


0.297 




85 


10.559 


5.757 


7.701 


6.182 


1.786 


0.968 




95 


20.689 


13.231 


13.481 


12.393 


5.717 


4.125 


609 


50 


2.661 


1.244 


3.499 


1.562 


0.397 


0.393 




85 


11.471 


6.419 


7.775 


7.428 


2.211 


1.230 




95 


21.622 


14.145 


14.384 


14.589 


7.121 


5.162 


1015 


50 


3.099 


1.311 


3.884 


2.083 


0.533 


0.531 




85 


12.727 


6.896 


8.440 


9.517 


2.775 


1.592 




95 


23.296 


15.107 


15.139 


18.621 


9.009 


6.462 


2009 


50 


1.995 


0.461 


1.799 


1.582 


0.379 


0.372 




85 


8.890 


3.662 


5.174 


7.363 


2.021 


1.145 




95 


16.946 


9.208 


9.104 


14.527 


6.572 


4.695 



Table 1. A companion to Fig. [6] this table lists the £2 error norms, relative to the original, of the reconstructions of the P-wave speed model of Montel iTet al.| 
( 2006 1 and the S-wave model of Ritsema et al. ( 2010 1 under hard wavelet thresholding in the angular coordinates. See Fig.|6]and text for more details. 



also quoted next to each panel as the percentage of the root mean 
squared error between the original and the reconstruction, normal- 
ized by the root mean squared value of the original in the origi- 
nal pixel representation, in percent. Specifically, we calculate and 
quote the ratio of £2 norms in the pixel-basis model vector m, 



100 x ||m-S{T[A(m)]}|| 2 /||m|| 2 



(5) 



which, in the lower-right annotations is called the "% error norm". 
We have written A for any of the wavelet (analysis) transforms that 
are used and S (synthesis) for their inverses, and T for the "hard" 
thresholding (Mallat 2008 ) of the wavelet and scaling coefficients. 

In Figs [6^ and [6], this same misfit quantity ([5]) is represented 
as a black line relevant to the left ordinate labeled "^2 error norm", 
which shows its behavior at 1% intervals of thresholding; the filled 
black circles correspond to the special cases shown in the map view. 
Only after about 80% of the coefficients have been thresholded 
does the error rise above single-digit percentage levels, but after 
that, the degradation is swift and inexorable. The blue curves in 
Figs|6^ and[6| show another measure of misfit relevant in this con- 
text, namely the ratio of the £1 norms of the thresholded wavelet 
coefficients compared to the original ones, in percent, or 

100 x ||T[A(m)] H,/ IIACm)^ . (6) 

As we can see from the figure the £2 ratios (5} in the black curves 
(and the left ordinate) evolve roughly symmetrically to the £\ ra- 
tios {6} in the blue curves (and the right ordinate), though evidently 
their range is different. 

Finally, a third measure that is being plotted as the red curve 
is the "total variation" norm ratio, in percent, namely 

100 x IIVSm^mjUII./HVmll,, (7) 

whereby || Vm^ is the sum over all voxels of the length of the 
local gradient of m. By this measure, which is popular in im- 
age r estoration applications (|Rudin et al.|1992[|Dobson & Santosa 
1996, Chambol le & Lions|1997| ), the quality of the reconstruction 
stays very high even at very elevated levels of thresholding; we note 
that its behavior is not monotonic and may exceed 100%. 

As with terrestrial topography in Fig. [5] we conducted all of 



the experiments on the seismic models that are presented in Fig. [6] 
in the D2, D4 and D6 wavelet bases. A summary of the £2 error 
norm ratios as a function of thresholding levels for each of those 
bases is presented in Table [T] On the strength of its behavior under 
the criteria (5}-(7} and upon visual inspection of the results, we 
conclude that the D4 basis remains a very appropriate choice for 
the efficient representation of seismic models. To this choice we 
adhere in the geophysically motivated study of mantle structure in 
those same models which follows below. 



4 TOMOGRAPHIC MODEL STRUCTURE 

There is much geophysical interest in tying seismic observations 
of mantle structure to models incorporating geodynamic modeling 
and mineral physics observations (e.g. Jordan et al. 1993, Karason 
|& van der HTTsT|[2000l |Becker & Boschi||2002| |Bull et al.|2009) . 
Our study is an attempt to provide a flexible, quantitative, mul- 
tiresolution framework for such analyses that may add to the more 
traditional power- spectral (e.g. Becker & Boschi 2002, Houser & 
Williams 2009 , Schubert h et al.|2009| > and statistical analyses (e.g. 
Hern lund & Houser|2008| ). In obliterating the phase of the anoma- 



lies, the former line of inquiry largely loses the relative spatial loca- 
tion of seismic structure, while the latter type of study is no longer 
sensitive to its scale and wavelength dependence. While in this pa- 
per we do not explicitly study the radial correlation of mantle struc- 
ture ( [Puster et al.|1995[|van der Hilst & Karason|1999| >, the analysis 
below readily lends itself to adaptation in the third dimension: our 
study is thus as much an initial exploration into the richness of the 
wavelet transform as a way of characterizing terrestrial heterogene- 
ity as an encouragement to further study. 

The first breakdown is as a function of depth and by scale 
of the D4 decomposition, as shown in Fig. [7] To aid in the inter- 
pretation we remind the reader of the dominant wavelengths that 
are represented at a specific scale by referring to Fig. [3] where of 
course it should be noted that the area of the panels decreases with 
the square of the depth in the Earth. 

The main observations relevant to both the Montelli et al. 
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Figure 7. Scale lengths of seismic heterogeneity as a function of depth in the Earth, to the core-mantle boundary (CMB), obtained from the complete angular 
expansion in the D4 wavelet basis of (a-b) the P-wave speed model of Montelli et al. ( 2006) and (c-d) the S-wave speed model of Ritsema et al. ( 2010}. See 
Fig.^for the wavelet and scaling functions and Fig. [6]for the seismic models: all calculations are with reference to cubed spheres with 6 x elements 
(N = 7), and to a 2 J dyadic subdivision (J = 4). Panels a and c show the maximum absolute values of the wavelet or scaling coefficients (wavs and seals 
in the legend, respectively) at the scales quoted, differentiated by color. The scaling coefficients at the fourth scale have the largest values: at all depths the 
maximum at this scale and the overall maximum (not shown) coincide. Panels b and d show the proportion (in %) of the contribution to the overall £2 norm of 
the seismic models at every depth by the ensemble of the coefficients at each of the scales. Ritsema' s model has much more structure in the top 410 km of the 
Earth (not shown because of the axis truncation is a peak with a value of 137.2 centered at 135 km) compared to the bottom 1000 km, as opposed to Montelli's 
model which has a more uniform distribution of heterogeneity. Both models are characterized by minima of seismic structure at mid-mantle depths. 



(2006 ) and the |Ritsema et al.| ( |2010| ) models are that seis- 
mic wavespeed heterogeneity has a dominantly "red spectrum" 
jCheyrot et aT]|1998a|b[ |Boschi & Dziewonski|[T999] ). Figs. 
and[7J; show the maximum absolute values of the wavelet and scal- 
ing coefficients at each of the four scales in the D4 decomposition, 
as a function of depth. The scaling functions at scale 4 (denoted 
"seals 4" in the legend; these are depicted in Fig. [3^) require the 
largest expansion coefficients; the maxima of the coefficients cor- 
responding to the wavelets at scale 4 ("wavs 4", see Figs[3]3-d) are 
only about half as large; those at scale 3 ("wavs 3", see Figs[5£-g) 
peak at about half that; and so on. While noting that the Montelli 
model has peak amplitudes that are about half as large as the ones in 
the Ritsema model, in both models the overall largest values are in 
the lithosphere, which encompass the crust and shallowmost man- 
tle down to about 250 km. The upper mantle (down to 660 km) 
and the transition zone (410-660 km) in particular are charac- 
terized by strong maxima that fluctuate with depth. Both seismic 
models have a somewhat different take on this measure of man- 
tle structure: the maxima in the Ritsema model (Fig. [7];) are more 
oscillatory with depth and have a strong peak around the 660 km 
mantle discontinuity which is broader than the corresponding one 
in the Montelli model (Fig. [7^). Each of the curves in Figs. [7^ 
and[7J: decays sharply with increasing depth in the lower mantle 
below 660 km depth to reach their smallest maxima in the mid- 
mantle before increasing again in the bottom 1000 km, near the 
core-mantle boundary (CMB). This identification of dominantly 
long-wavelength structure near the core-mantle boundary (see also 
Wy session 1996; van der Hilst & Karason 1999 ) is relatively more 
pronounced in the Montelli model than in Ritsema' s. In Mon- 



"bumps" near the CMB, while the corresponding increase in maxi- 
mum structure in Ritsema' s S wave model (Fig. [7};) is more gradual 
and confined mostly to the longest- wavelength scaling functions at 
scale 4 (see also [Wy session et al.|1999| >. 

The maximum values of the expansion coefficients in the 
wavelet basis provide but one part of interpretation of mantle struc- 
ture, thus in Figs. [7]} and|7Jl we plot the percentage- wise relative 
contribution of the wavelet and scaling coefficients at each scale 
to the overall £2 norm of the respective seismic models. These 
curves again reveal the scale and depth dependence of mantle het- 
erogeneity, but now in terms of how much variance is explained by 
each scale at every depth individually: each of the curves sums to 
very nearly 100% at every depth. Their failure to sum to exactly 
100% arises from the preconditioning of the wavelet transforms 
at the edges, which renders even the D4 transforms slightly non- 
orthonormal overall; however, these small (< 1%) deviations are 
not sufficiently important to influence any of the interpretations. In 
this analysis we note that once again the relative contributions to 
model structure are more variable with depth in the Ritsema model 
(Fig. [TJl) than in the Montelli model (Fig. [TJ)), which is particu- 
larly smooth in this regard. In both, however, the importance of the 
structure at scale 3 grows as a function of depth to reach a maxi- 
mum about one third of the way down. This maximum is particu- 
larly well pronounced in Ritsema's model where it is well localized 
at the top of the lower mantle, between 660 km and 1000 km depth. 
The growth of scale 3 structure comes at the expense of scale 4 
structure, suggesting that in that depth range long-wavelength het- 
erogeneity is broken down to smaller scales. 
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Figure 8. Scale lengths of seismic heterogeneity as a function of depth in the Montelli et al. (2 006} P-wave and the |Ritsema et al.| ( p010) S- wave models. 
The calculations are identical to those reported in Fig.[7]but they are now broken per cubed-sphere chunk to reveal geographical variations in seismic mantle 
structure. See Fig.[2]for the numbering scheme used in the legend identifying the colored lines: roughly speaking, 1 corresponds to the Pacific, 2 to Antarctica, 3 
to Asia, 4 to South America, 5 to North America, and 6 to Africa. The relative lack of fine structure at scales 1 and 2 and the less geographically differentiated 
character at scales 3 and 4 of the Ritsema model clearly distinguishes it statistically from the Montelli model. Other features are more persistent between 
models, such as the predominantly large-scale structure near the core-mantle boundary underneath Africa and the Pacific, and the predominantly smaller-scale 
features in the shallow mantle and crust underneath Asia and North America. 



A final window into the Earth's structural heterogeneity as 
well as a useful comparison between models comes in the form of 
Fig. [8] where we are able to deconstruct both of the seismic mod- 
els under consideration on a chunk-by-chunk basis. The (arbitrary 
and thus easily modified) choice we made in Fig.[2]to deviate from 
the canonical Ronchi et al. (1996) orientation of the cubed sphere 
by approximately centering each of the faces on a major continen- 
tal landmass now allows us to study the relative contributions of 
the depth-dependent seismic structure broken down by preponder- 
ant scale length as a function of location in the Earth. Each of the 
curves originally plotted in Figs [7]} and[7Jl degenerates to six indi- 
vidual ones with their own geographical affiliation. The numbering 
scheme is the one introduced in Fig. [2] thus in order of appearance, 
1 corresponds to the Pacific realm, 2 to Antarctica, 3 to most of 
Asia, 4 to South America, 5 to North America and parts of Eura- 
sia, and 6 to Africa, the middle East and the Arabian Peninsula. In 
the computer code that accompanies this paper any other wholesale 
rotation may be applied to the master grid, e.g. to undo the some- 
what unfortunate splitting of Australia over chunks 2 and 3 and of 
Eurasia over chunks 3, 5 and 6. In other words, the cubed-sphere 
wavelet transform may be applied in "detector" mode by rigid rota- 
tion to center on any point of interest. Moreover, provided the scales 
to be analyzed allow it, any geographical portion of the wavelet- 



transformed coefficients may be zeroed out to provide even more 
geographical selectivity without compromise. Such is the power de- 
rived from multi-resolution and scale-space localization under the 
wavelet transform. 

Among other features the results presented in Fig. [8] re- 
veal how the dominantly long- wavelength structure near the core- 
mantle -boundary is mostly due to what lies beneath Africa and the 
Pacific: indeed these are regions that have been long known for 
being the source of various long-wavelength mantle upwellings or 
(super-)plumes (Ni & Helmberger 2003). As to Ritsema' s model, it 
is surprising how little mantle structure is present at the very short- 
est wavelengths of scale 2 (see Figs[5Ji-i) and scale 1 (whose foot- 
print, not shown in Fig [3] is exactly half that of scale 2). While also 
in Montelli' s model the heterogeneity at these scales remains lim- 
ited at the sub-percentage level, there is considerable more energy 
that contributes to the model norm, and there is much more geo- 
graphical variability between chunks in this latter model. The rel- 
ative lack of a geographical signature when comparing Ritsema' s 
to Montelli's model continues to be apparent at the larger scales 3 
and 4; only at scale 4 do both models ascribe mantle structure with 
significant difference to each of the six gross mantle domains. Pre- 
sumably this rather different character between both models is due 
to the data selection and model parameterization: Ritsema' s model 
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contains the effect of the whole-mantle sensitivity of normal-mode 
splitting functions and the spread-out influence of long-period sur- 
face waves. Moreover, this model is derived in terms of global 
spherical harmonics ( Rit sema et al.|2010| ), although the resolution 
gains from including spherical harmonic basis functions to degree 
and order 40 as compared to an earlier iteration of this model ( |Rit-| 
|sema et al. 1 1999] [2554) |Ritsema|2005] > appear modest. Montelli's 
model, in contrast, contains only body-wave observations, albeit 
using finite-frequency sensitivity theory which noticeably "fattens" 
their traditional, ray-theoretical, zone of influence (Montelli et al. 
2004 2006), and it is parameterized on a grid of tetrahedral nodes 
that, while globally distributed throughout the Earth's volume, al- 
lows for more degrees of freedom and hence spatial variability in 
the recovered seismic model. Undoubtedly the scale- and space- 
dependent breakdown of both models is also influenced by the dif- 
ferent choices of damping and smoothing in the inverse problem 
that led to their construction (Boschi & Dziewonski 1999 ). Thus, 
while our analysis cannot claim to uncover the "truth" in charac- 
terizing Earth structure, it does however, endow us with a measure- 
ment tool for the multi- scale dependence of seismic model struc- 
ture. This will serve as a target to reconcile such models with what 
we can learn from forward geodynamical modeling or in their con- 



frontation with mineral physics observations (e.g. Megnin et al. 


1997 


|Piromallo et al.|2001 


, Cammarano et al. 2005 ; Piazzoni et al. 


2007 


|Ritsema et al.|2007l|] 


Bull et al.|2009). 



5 THE INVERSE PROBLEM 

In the previous sections we have constructed a new wavelet trans- 
form on the three-dimensional ball. We have shown that, in a suit- 
ably chosen wavelet basis, Earth models require few significant co- 
efficients. We have used our wavelet scheme to deconstruct two 
tomographic Earth models and evaluated those both for their spar- 
sity and to study the distribution of mantle structure as a function 
of scale, depth, and geographical location. While we have argued 
that we can learn much from such exercises, we have only partially 
reached our end goal, which is to harness the power and perfor- 
mance of spherical wavelet bases to build new seismic tomographic 
models, directly from the data, and which are expected to be sparse 
in such bases. We have not solved any inverse problems yet. In this 
section we explain how these wavelets can be used to do that, too. 

Wavespeed models are constructed from seismic data. With 
respect to a reasonably sized global model parametrization these 
data are incomplete, as seismic stations are mostly concentrated in 
a limited number of regions around the globe — that is, until the 
oceanic arrays of the future ( Simons et al. 2009 ). As usual we shall 
assume that a background velocity model is known, and that our 
goal is to solve the data for a perturbation m(x) to that reference 
model. We may approximate the seismic observations 



K{x) ra(x) d 3 x — 



(8) 



which are of the most general kind described by such integral equa- 
tions and with K any of a veritable plethora of possible kernel 
functions (Nolet 2008 ), by the discretization on the grid defined 
in Section[2] This leads to an inverse problem in matrix form, 



Km 



(9) 



each voxel and the elements of every row of K will be the numeri- 
cal values of the integral of the kernel K(x) over those voxels. 

Eq. {9} remains beholden to the usual assumption of linearity 
in linking the model perturbation m to the data d. Acknowledg- 
ing that the data may be contaminated by (Gaussian) noise n, the 
inverse problem is defined as requiring us to find the best choice 
of m by which to reduce the data misfit: the squared £2 norm 
|| K • m — d||2, to the noise level, ||n|||. Because the data are in- 
complete, the problem is ill-posed and infinitely many such models 
exist. Additional conditions need to be imposed to arrive at a unique 
and physically acceptable solution. This is often done by adding a 
penalty term V(m) to the data misfit, which leads to the functional 



T(m) = ||K ■ m - d||a + V(m) 



(10) 



which is to be minimized. The role of the penalty term is to ensure 
that T has a unique and acceptable minimizer. The trade-off be- 
tween data fit and a priori information is encoded in the penalty V. 
A convenient and often advocated choice for V(m) is a multi- 
ple of the norm-squared of the Laplacian of the model, V(m) — 
A|| V 2 m|||, which favors smoothness in the solutions (see, e.g., 
|Yanovskaya & Ditmar|l 990)1 VanDecar & Snieder|1994] >. The equa- 
tions for the minimum of ^(m) remain linear, 



K T K m + A(V 2 ) T V 2 m = K T • d, 



(11) 



and can thus be handled by standard algorithms. The trade-off pa- 
rameter A needs to be carefully chosen ( Hansen p992| ). 

The novelty now is that we should be able to use model spar- 
sity rather than smoothness as prior information. As discussed in 
Section [3] seismic tomographic models may be very well repre- 
sented by a sparse wavelet expansion. Incorporating this knowledge 
from the start may therefore lead to important benefits to the behav- 
ior of the inversion scheme. In the following we shall assume that 
we have chosen a particular set of wavelet and scaling basis func- 
tions, see Section[2] to represent and build the unknown model. 

With the model ra(x) expanded in our wavelet basis via the 
transform A as in the notation of Section[3] and the individual basis 
functions collected in the columns of a matrix S, the synthesis map, 
the pixel-basis model vector m is 



m — 5(w) = S • w, 



(12) 



with w the vector of expansion coefficients in this basis. In having 
previously defined our construction in terms of a discrete wavelet 
transform we do not need to devise a separate form of discretization 
for each of the many choices of wavelet bases that are available to 
us. In this flexible approach we define the grid size of the cubed 
sphere at the outset and we are thus able to switch between the 
various wavelet bases without much additional effort. As we shall 
remark later on S will usually be provided as a (fast) software algo- 
rithm and not as a matrix per se. We shall also see that the seismic 
inversions only require application of S and its transpose S T . The 
inverse S _1 , the analysis map, is not required to be known — or 
even exist, as is the case for a redundant set of basis functions. 

The sparsity of the model parameters w can now be encour- 
aged by choosing the penalty V to be proportional to the number 
of nonzero entries in w, which we write as ||w||o for short. The 
functional to be minimized then becomes 



where the aim is to reconstruct the model values m from the data 
vector d. The elements of m are the values of the model inside of 



JS(w) = ||K.m-d||! + A||w|| . 

We define the solution to the inverse problem as 

w — argmin (||K • S • w — d|| 2 + A||w|| ) , 

w 

and the reconstructed model is 



(13) 



(14) 



12 Simons, Loris, Nolet, Daubechies & others 



s • w. 



(15) 



The functional in eq. (T4) however is not convex: there exist lo- 
cal minima which makes the minimization much less feasible than 
solving a system of linear equations. Despite this an iterative al- 
gorithm based on hard thresholding exists (Blumensath & Davies 
[2008112009] ), as briefly discussed by |Loris et al.| ( |2010| ). 

An alternative, and computationally much more tractable, 
method for imposing model sparsity in a given basis is to use an 
norm penalty (jDonoho 2006) |Daubechies et al.||2004l |Bruck-| 



stein et al.|2009 ). By identifying ||w||i = ^\ \wi\, and choosing 



V — 2A 1 1 w 1 1 1 for the penalty function, then 
Ti(w) = ||K • S • w - d\\ 2 2 + 2A||w||i 



(16) 



is to be minimized. This functional is convex: a local minimum is 
therefore automatically a global minimum ( |Loris et al.|2007) . 

The functional ( fTSj ) is not differentiable but because T\ is the 
sum of a differentiable and a separable non-differentiable part, con- 
vex optimization techniques can find w = arg min w ^(w) and 
the corresponding model (T5] > with reasonable efficiency. Indeed 
the iteration 



)] 



(17a) 
(17b) 



W n + l - U[\Vn + /3 n (w n - W n -l)] , 

W(w) = T a \ [w + a S T • K T • (d - K • S • w 

converges to the minimizer of < fT6] > ? as shown by |Beck & Teb oulle 
(2009). Hereby T a \ now stands for "soft" thresholding (Mai 
lat 2008 ) of the coefficients on a component-by-component ba- 
sis, which is to say 7^(w) = for |w| < r, and 7^(w) = 
w — rsgn(w) for |w| > r. This is a non-linear operation. The 
parameter a in eq. (17b) can be chosen as the reciprocal of the 



largest eigenvalue of S • K • K • S. We choose to = 1 and 



fin — (t n — l)/£n+l, 
tn + 1 (1 + VT+4^)/2. 



(17c) 
(17d) 



A non-iterative direct algorithm also exists (? |Loris|2008) , but be- 
cause of the large problem sizes typically encountered in seismic 
tomography, we focus here on this so-called Fast Iterative Soft 
Thresholding Algorithm (FISTA). It has an 1/n 2 rate of conver- 
gence to .Fi(w), which is in a sense optimal. The algorithm (IT) 
was used by |Loris et ah] ( |2010| ) on a 3D toy tomographic model. 
There is however a typographical error in that work, which missed 
the factor 4 under the square root in eq. (17d) . 

The iterative algorithm (T7) requires only two linear maps, and 
their transposes. First there is the linear map from model to data 
space, given by the matrix K in eq. (9]). The second is the linear 
map S from model parameters to model space, eq. (T2) . This map 
is typically available in the form of a (fast) algorithm, in casu the 
inverse wavelet transform S, rather than explicitly in matrix form. 
Each iteration step of algorithm (IT) requires one application of K, 



K T , S and S T each. Eqs < |17af[17d] > demonstrate that the iterative 
inversion algorithm does not require the inverse of the map S, much 
as it does not require the inverse of K. Moreover, neither S nor K 
need be invertible. As already mentioned, this means that a model 
may be represented by a sparse superposition of a redundant set of 
functions in which the expansion of the model is no longer unique. 
For example, redundant dual-tree wavelets were used in a synthetic 
tomography experiment by |Loris et al. 1(2007) . 

In practice it turns out to be easier to keep K • S and S T • K T 
in eq. (17b) in factorized form. One can then easily switch bases by 
modifying S and rerunning the inversion algorithm. No new matrix 
K • S needs to be pre-computed, which is important given that K 




Figure 9. Aerial view showing our second adaptation of the cubed sphere 
of Ronchi et al. ( 1996 1. The black lines identify the boundaries of the six 
chunks that were apparent also in Fig.^ The blue lines correspond to the 
boundaries of the overlapping "superchunks" as discussed in the text. 



may have several hundreds of thousands of rows. This is particu- 
larly useful in the case of sparse reconstructions, where the choice 
of basis itself is one of the undetermined factors to be assessed by 
simulation during the inversion: Section [3] made it clear that using 
model sparsity as a priori information depends on the details of the 
basis used. Setting up the inversion software in this manner is there- 
fore forward-looking as new transforms can easily be incorporated 
later. Examples of emerging techniques that can be evaluated in this 
context are curvelets and shearlets (Candes et al. 2005 ; Labate et al. 
2005 ; Easley et al. 2008 ), which offer better directional sensitivity 
than classical wavelet transforms but are redundant. The described 
flexibility of our approach was a major design requirement and will 
yield many dividends in future applications. 



6 A SECOND CONSTRUCTION 

In principle we are now ready to apply the first family of wavelet 
constructions on the cubed sphere that we introduced in Section[2]to 
the inverse problem in the manner outlined in Section [5] As shown 
in Section |3] we expect our solutions to be sparse, and as discussed 
in Section [4] we will be able to use this sparsity and the location- 
and scale-dependence of the results to make geophysical inference 
about the structure of seismic heterogeneity in the Earth. 

As we recall, our First Construction entailed defining wavelet 
and scaling functions on a single chunk £,77 G [— 7r/4,7r/4] and 
then mapping them onto the sphere using eq. (TJ. By this definition 
the basis functions live on a single chunk. Without the modifica- 
tions and preconditioning of the basis at the boundaries between 
the chunks that we introduced, sharp discontinuities in the behavior 
of the coefficients occur at the chunk edges; making the transforms 
edge-cognizant, as we did in the manner of Cohen et al. (1993]), re- 
quired special tailoring of the transforms. This is often cumbersome 
and in general harms our stated goal of keeping our procedure flex- 
ible enough to be able to switch from one wavelet family to another 
which might be more suitable with hindsight. In addition, the inter- 
val wavelet transforms that we used so far are not norm-preserving. 
Extensive experimentation with such bases revealed that despite 
their qualities in the representation of geophysical functions, i.e. in 
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Figure 10. Top: Six superchunks with a number of wavelet and scaling 
functions defined on them. Bottom: The same functions mapped to the 
sphere by the procedure described in Section [6] They are smooth every- 
where. 



performing the forward mappings, when used for the inverse prob- 
lem the solutions obtained using eq. (Tf\ were plagued by unsightly 
artifacts at the seams between chunks (not shown). Presumably the 
i\ -thresholding could be adapted locally to counter this effect, but 
to be truly practical we should not have to resort to this. We thus 
desire a mechanism to map any localized basis function defined on 
a Cartesian grid to the sphere, with smoothness even across chunk 
boundaries. Here we present a straightforward, universal method 
that accomplishes this. 

As opposed to the geometry of the |Ronchi et al.| {T996) cubed 
sphere shown in Fig. [T| we now cover the sphere with six larger 
chunks, by extending the coordinates by 50% on each chunk, to 
i, fj G [— 37r/8, 37r/8], see Fig. [9] We shall refer to these partially 
overlapping domains as "superchunks". In (£, fj) coordinates they 
are simply six large squares (rather: cubes if we take the radial di- 
rection into account also), with the "original" chunks at their cen- 
ters. Functions defined on this central part can now smoothly cross 
into the outer part, that is, they are allowed to spill over into an- 
other chunk while staying in the same superchunk. Fig. [To] shows a 
selection of examples where this is the case. The smoothness of the 
functions across the boundaries is apparent, though we note that if 
we were to plot them in the manner in which Fig. [2] was presented, 
they would appear to have kinks in them; this is simply because the 
coordinate transform of eq. (T} itself is non- smooth. 

To map a function defined on a single superchunk k = 1 — » 6 
to the corresponding chunk and its neighbors, one loops over all 
the voxels in this central chunk and its four neighbors. The center 
(£, 77, r, k) of each such voxel is mapped to (x, y, z) coordinates 
using formula {l}. In the same way as eq. {2} we then calculate 



[atan(z/2/), atan(— x/y) ] 


if 


k 


= 1, 


[atan(y/x), atan(— z/x) ] 


if 


k 


= 2, 


[atan(— y/z), atan(x/z) ] 


if 


k 


= 3, 


[atan(x/z), atan(— y/z) ] 


if 


k 


= 4, 


[atan(— z/x), atan(y/x) ] 


if 


k 


= 5, 


[atan(— x/y), atan(z/y) ] 


if 


k 


= 6, 



(18) 



to convert these (x,y,z) to the (£,fj,f — r) coordinates in the 
superchunk k, limited to — 3tv/8 < £,fj < 3tv/8. This then deter- 
mines which voxel in the superchunk is mapped to the voxel in the 
original chunk. The index of the voxel in each superchunk k is 



i = l + 



(z 3tt\ 2N I . i I / 3tt\ 2N I 

where N is the number of voxels in the £ and r\ directions of a 
chunk and [J indicates rounding down. Voxel indices run from 
1 — > N in an original chunk and from 1 — > 3N/2 in a super- 
chunk. The central part of a superchunk is a copy of the original 
chunk, whereas the voxels outside the center of a superchunk are 
mapped to neighboring chunks. As the superchunks partially over- 
lap, a chunk voxel on the sphere may receive contributions from up 
to three superchunks: a voxel near a chunk corner may receive three 
contributions, a voxel near a chunk edge may receive two, and vox- 
els near chunk centers only one. The identifications are most easily 
made by table look-up. 

In Fig. [TO] we show a number of wavelet functions from this 
Second Construction at a variety of locations. These now map 
smoothly to the sphere. The wavelets shown here are from the |Co-| 
|hen et al.| ( [T992t CDF 4-2 wavelet family, as in Figure [4] These 
are mirror- symmetric in the (£,77) domain, but they are no longer 
orthogonal. As in Section [2] the wavelets at a fixed scale are not 
rotations of each other on the sphere, but rather translates in the 
superchunk (£, fj) domain. This effect is most noticeable for the 
wavelet and scaling functions that are located on or near chunk 
edges, specifically near the corners. Basis functions that have the 
same norm in the superchunk domain may not have the same norm 
in the chunk domain. 



7 NUMERICAL EXPERIMENTS 

We consider a set of great-circle paths that is a global collection 
of 2469 earthquakes and 199 stations yielding 8490 surface- wave 
paths, a situation based on, if not identical to, the ray path cover- 
age in the models of Rayleigh-wave phase speeds at 80 s period 
obtained by |Trampert & Woodhouse| ( |1995|[T^96l|2001| >. For sim- 
plicity we convert this path coverage to the ray-theoretical values of 
arrival-time sensitivity expressed in our model domain. The image 
in Fig. pT] (top panel) renders all rays in this data set of realistically 
heterogeneous global seismic sensitivity. For synthetic input model 
we chose a single interval of the Montelli et al. (2006 ) model cen- 
tered on 722 km depth, shown in Fig. [TT] (second from the top). In 
addition, and this is an admitted departure from realism, we select 
four circular regions of null structure. Their purpose is to test the 
inversion algorithm and the choice of basis when sharp wavespeed 
contrasts are known to be present in the true model. We calculate 
the travel-time perturbations over these 8490 ray paths and add 
Gaussian noise to them with an rms value that is 10% of that of 
the rms of the data. The variance of this noise is denoted a 2 . 

The reconstruction is by the algorithm (T7] > using the four- 
level CDF 4-2 wavelets under the Second Construction by which 
smooth chunk crossings were enabled, as shown in Fig.[l0| In keep- 
ing with the description of Section [5] the dual aim is to satisfy the 
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noisy data in the traditional £2 sense while favoring a model that is 
sparse in the wavelet basis by minimizing the £\ norm of the coeffi- 
cients. Fig. [TT] (third from the top) shows the obtained solution. Of 
the 6 x 128 == 98304 degrees of freedom in this parameterization 
the algorithm terminates on a model with 1670 nonzeroes. Due to 
lack of data, the relative output error is high: 33.5%. 

The behavior of the solution through the 1000 iterations is 
shown in Fig. [TT] (bottom panel), which plots the £\ norm of the 
wavelet coefficients against a measure of the evolving misfit calcu- 
lated as the "reduced chi- square" 



N 



^ 2 l|d||o 



||d-K-m||i 

^ 2 l|d||o ' 



(20) 



in other words, the squared £2 norm of the data misfit normalized 
by the noise variance and the total number of data constraints, for 
which it is reasonable to assume ( Nolet 2008 ) that it is distributed 
as a xi variable. The starting point of the iteration is marked by the 
filled white circle, and the final solution by the filled white square, 
which is arrived at when the % 2 /N variable in eq. |20| l reaches its 
expectation 1. Every one of the 1000 solutions in the sequence is 
marked by a black cross. As we notice the algorithm |T7| > rapidly 
reduces the data misfit in the first few steps, slowing down after 
that, at the same time increasing the sum of the absolute values 
of the wavelet coefficients. After "turning a corner" in this space, 
the remainder of the time spent is in reducing the £1 norm of the 
coefficients of the solution while slowly converging to the target 
reduced chi-square of x 2 /N — 1. 

The solution is very good; the input model is well matched and 
the leakage of the solution into the areas where no structure should 
be recovered is relatively minor. The global map views in Fig. (TT) 
had all values that fall below a threshold of l/20 th of their maxi- 
mum absolute value rendered white for visual guidance, and they 
receive several additional annotations for us to be able to judge the 
quality of the solution quantitatively. The minimum and maximum 
values of the models are quoted in the top left corner, and in the top 
right corner we show their mean and the root mean squared values. 
The four sets of numbers in the bottom left and right hand corners 
quote these same metrics for the areas contained inside of the cir- 
cular areas. As the input model has no structure there, all of these 
are zeroes. This is no longer the case for the output which serves as 
our way to evaluate the leakage of the solution into those areas. As 
we can see the comparison is very favorable. 

While we have conducted numerous synthetic tests with a 
multitude of synthetic input models (including checkerboard tests, 
Gaussian shaped anomalies positioned at various locations, and us- 
ing a variety of ray path coverages), only one of these tests is re- 
ported here. As noted by Loris et al. ( 2007 2010 ) there are more 
algorithms available to us than the one described in eq. {TT]), and as 
we have argued in this paper there is a plethora of wavelet construc- 
tions that can be brought to bear on the inverse problem of global 
seismic tomography. All of these alternatives remain in principle 
candidates to be implemented using our First or Second Construc- 
tion for wavelets on the sphere. A more detailed comparison of 
their relative performance is well underway and will be reported in 
forthcoming work. It is there also that we will fully integrate the 
third dimension into our formalism. Conceptually, there is no diffi- 
culty in doing this: we have transformed the ball of the Earth into 
six independent or partially overlapping Cartesian model domains 
with three separable coordinates. Taking into account the depth di- 
mension merely involves applying a third wavelet transform to the 
result of the transform in the two angular coordinates, but as there 
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Figure 11. Synthetic experiment under realistic conditions with a twist, 
illustrating the recovery of a seismic tomographic model with artificially 
introduced "blank spots" from noisy data, using the Second Construction 
discussed in Section [6] and the iterative algorithm of Section [5] The solu- 
tion after one iteration is represented by the filled white circle in the bottom 
panel; the solution after 1000 iterations by the filled white square. 
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remain choices to be made, a thorough discussion remains outside 
the scope of the current paper. 



8 CONCLUSIONS 

Until now, seismic wavespeed models of the Earth have been rou- 
tinely parameterized in terms of spherical harmonics, networks of 
tetrahedral nodes, rectangular voxels, or spherical splines. How- 
ever, there were few approaches to Earth model parameterization 
by wavelets on the three-dimensional ball. To the rich field of 
wavelets on the ball or its surface, the sphere, we have contributed 
two new flexible constructions that are eminently suited to solve 
seismological tomographic inverse problems. 

To form the numerical grid we considered a surface tessela- 
tion known as the "cubed sphere", popular in fluid dynamics and 
computational seismology, which can be combined with a (semi- 
regular) radial subdivision. This mapping transforms the entire vol- 
ume of the mantle into six portions. In the new variables, these 
"chunks" correspond to rectangular boxes with Cartesian coordi- 
nates. Standard algorithms can then be used to perform the wavelet 
transformation (or any other) in each of the six bounded volumes. 
We developed two possible classes of discrete wavelet transforms 
in the angular dimension of the cubed sphere. One relies on pre- 
conditioning and special boundary filters to account for the edges 
separating the chunks; another one broadens the definition of the 
cubed sphere to include chunks that partially overlap, on which we 
implement standard wavelet transforms. 

Much has been gained by our design of procedures that effi- 
ciently parameterize the seismological inverse problem. First, the 
multiresolution character of a wavelet basis allows for the models 
to be represented with an effective spatial resolution that varies as 
a function of position within the Earth. Second, inversion schemes 
that are formulated in terms of wavelets can exploit recent theo- 
retical and numerical advances by which the most sparse solution 
vector, in wavelet space, is found through iterative minimization of 
a combination of the £2 (to fit the data) and £\ norms (to promote 
sparsity in wavelet space). 

In preparation of the continuing increase in high-quality seis- 
mic data that is expected in the decades to come, our focus has also 
been on numerical efficiency and the ability to use parallel com- 
puting in constructing the model. We have shown how our seis- 
mic model representation behaves under progressive thresholding 
of the wavelet coefficients, and how the geographically distributed 
power of published seismic models varies over the scale lengths 
that can be independently resolved. Synthetic tests under realistic 
conditions validates the approach that we advocate for the future 
of seismic tomography, which shows the ability to explain hetero- 
geneous, massive data sets under the constraint that the best-fitting 
models should also be sparse in the wavelet bases used. 
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